Summary

The complete 9193-nucleotide sequence of the probable causative
agent of AIDS, lymphadenopathy-associated virus (LAV), has been
determined. The deduced genetic structure is unique: it shows, in
addition to the retroviral gag, pol, and env genes, two novel open
reading frames we call Q and F. Remarkably, Q is located between
pol and env and F is half-encoded by the U3 element of the LTR.
These data place LAV apart from the previously characterized
family of human T cell leukemia/lymphoma viruses.
Introduction

The recent onset of severe opportunistic infections among
previously healthy male homosexuals has led to the characterization
of the acquired immunodeficiency syndrome (AIDS) (Gottlieb et al.,
1981; Masur et al., 1981). The disease has spread dramatically, and
new high-risk groups have been identified: patients receiving blood
products, intravenous drug addicts, and individuals originating from
Haiti and Central Africa (Piot et al., 1984). AIDS is a fatal
disease, and there is at present no specific treatment. The
causative agent was suspected to be of viral origin since the
epidemiological pattern of AIDS was consistent with a transmissible
disease, and cases had been reported after treatment involving
ultrafiltered anti-hemophilia preparations (Daly and Scott, 1983).
A decisive step in AIDS research was the discovery of a novel human
retrovirus called lymphadenopathy-associated virus (LAV)
(Barre-Sinoussi et al., 1983). The properties of the virus
consistent with its etiological role in AIDS are: the recovery of
many independent isolates from patients with AIDS or related
diseases (Montagnier et al., 1984); high LAV seropositivity among
these populations (Brun-Vezinet et al., 1984); a tropism and
cytopathic effect in vitro for the helper/inducer T-lymphocyte
subset T4 (Klatzmann et al., 1984), also found depleted in vivo.

Other groups have reported the isolation of human retroviruses, the
human T cell leukemia/lymphoma/lymphotropic virus type III
(HTLV-III) (Popovic et al., 1984) and the AIDS-associated retrovirus
(ARV), which display biological and sero-epidemiological properties
very similar to if not identical with those of LAV (Levy et al.,
1984; Popovic et al., 1984; Schupbach et al., 1984). Both LAV and
HTLV-III genomes have been molecularly cloned (Alizon et al., 1984;
Hahn et al., 1984). Their restriction maps show remarkable
agreement, including a Hind III restriction site polymorphism,
bearing in mind the variability of this virus (Shaw et al., 1984)
and confirming that these two viruses represent a single viral
lineage.

In addition to its obvious diagnostic and therapeutic potential,
the LAV DNA nucleotide sequence is essential to an understanding of
the genetics and molecular biology of the virus and its
classification among retroviruses. We report here the complete
9193-nucleotide sequence of the LAV genome established from cloned
proviral DNA.

Results 

DNA Sequence and Organization of the LAV Genome 
We have reported previously the molecular cloning of both cDNA and
integrated proviral forms of LAV (Alizon et al., 1984). The
recombinant phage clones were isolated from a genomic library of
LAV-infected human T lymphocyte DNA partially digested by Hind III.
The insert of recombinant phage λJ19 was generated by Hind III
cleavage within the R element of the long terminal repeat (LTR);
thus each extremity of the insert contains part of the LTR. We have
eliminated the possibility of clustered Hind III sites within R by
sequencing part of an LAV cDNA clone, pLAV 75 (Alizon et al., 1984),
corresponding to this region (data not shown). Thus the total
sequence information of the LAV genome can be derived from the λJ19
clone.

Using the M13 shotgun cloning and dideoxy chain termination method
(Sanger et al., 1977), we have determined the nucleotide sequence of
the λJ19 insert. The reconstructed viral genome with two copies of
the R sequence is 9193 nucleotides long. The numbering system starts
at the cap site (see below) of virion RNA (Figure 1).
The viral (+) strand contains the statutory retroviral genes
encoding the core structural proteins (gag), reverse transcriptase
(pol), and envelope protein (env), and two extra open reading frames
(orf) that we call Q and F (Table 1). The genetic organization of
LAV, 5'LTR-gag-pol-Q-env-F-3'LTR, is unique. Whereas in all
replication-competent retroviruses pol and env genes overlap, in LAV
they are separated by orf Q (192 amino acids) followed by four small
(<100 triplets) orf. The orf F (206 amino acids) slightly overlaps
the 3' end of env and is remarkable in that it is half-encoded by
the U3 region of the LTR.

Such a structure clearly places LAV apart from previously sequenced
retroviruses (Figure 2). The (-) strand is apparently noncoding. The
additional Hind III site of the LAV clone λJ81 (with respect to
λJ19) maps to the apparently noncoding region between Q and env
(positions 5166-5745). Starting at position 5501 is a sequence
(AAGCCT) that differs by a single base (the fifth, C in place of T)
from the Hind III recognition sequence (AAGCTT). It is anticipated
that many of the restriction site polymorphisms between different
isolates will map to this region.
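
The near-miss site described above can be illustrated with a short scan for hexamers that differ from the Hind III recognition sequence by one base. This is a minimal sketch, not the authors' procedure: the function names and the toy sequence are hypothetical, and only the recognition sequence AAGCTT and the variant AAGCCT come from the text.

```python
HINDIII = "AAGCTT"  # Hind III recognition sequence


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def near_sites(seq: str, site: str = HINDIII, max_mismatch: int = 1):
    """List (1-based position, hexamer, mismatches) for windows that are
    close to, but not identical with, the recognition site."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        d = hamming(window, site)
        if 0 < d <= max_mismatch:
            hits.append((i + 1, window, d))
    return hits


# Toy sequence (hypothetical), not taken from the LAV genome.
toy = "GGAAGCCTGG"
print(hamming("AAGCCT", HINDIII))
print(near_sites(toy))
```

Run over a full genome sequence, such a scan would list candidate positions where a single point mutation could create a polymorphic Hind III site.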



[Figure 1: nucleotide sequence of the LAV genome, positions 1-9193, with the predicted amino acid translations of the open reading frames (including ORF Q and ENV) and the Hind III cloning sites marked.]


Figure 1. Complete DNA Sequence of Viral Genome (LAV-1a)

The sequence was reconstructed from the sequence of phage λJ19 insert. The numbering starts at the cap site, which was located experimentally
(see above). Important genetic elements, major open reading frames, and their predicted products are indicated together with the Hind III cloning
sites. The potential glycosylation sites in the env gene are overlined. The NH2-terminal sequence of p25gag determined by protein microsequencing
is boxed (Genetic Systems, personal communication).

Each nucleotide was sequenced on average 5.3 times: 85% of the sequence was determined on both strands and the remainder was sequenced
at least twice from independent clones. The base composition is T, 22.2%; C, 17.8%; A, 35.8%; G, 24.2%; G + C, 42%. The dinucleotide CpG
is greatly under-represented (0.3%) as is common among eukaryotic sequences (Bird, 1980).
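
The composition figures in this legend are simple counts, and a sketch of how they could be computed may help. The functions and the toy sequence below are hypothetical illustrations, not the programs used by the authors. Reading the reported composition as C 17.8% and G 24.2% (an assumption about the printed values), independent bases would give a CpG frequency near 0.178 × 0.242, or about 4.3%, which is the baseline against which under-representation is judged.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of each base (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGT"}


def dinucleotide_percent(seq: str, dinuc: str = "CG") -> float:
    """Percentage of overlapping dinucleotide positions equal to dinuc."""
    seq = seq.upper()
    positions = len(seq) - 1
    hits = sum(1 for i in range(positions) if seq[i:i + 2] == dinuc)
    return 100.0 * hits / positions


# Reported composition, read here as C 17.8% and G 24.2% (assumption).
reported = {"T": 22.2, "C": 17.8, "A": 35.8, "G": 24.2}
expected_cpg = reported["C"] * reported["G"] / 100.0  # percent, if bases were independent

toy = "AACGTTAGCA"  # hypothetical toy sequence
print(base_composition(toy))
print(dinucleotide_percent(toy))
print(round(expected_cpg, 1))
```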



The LTR

The organization of a reconstructed LTR and viral flanking elements
is shown schematically in Figure 3. The LTR is 638 bp long and
displays usual features (Chen and Barker, 1984): it is bounded by an
inverted repeat (5'ACTG) including the conserved TG dinucleotide
(Temin, 1981); adjacent to 5' LTR is the tRNA primer binding site
(PBS), complementary to tRNA^Lys (Raba et al., 1979); adjacent to 3'
LTR is a perfect 15 bp polypurine tract. The other three polypurine
tracts observed between nucleotides 8200-8800 are not followed by a
sequence that is complementary to that just preceding the PBS.
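
A polypurine tract is a run of A and G residues only, so candidate tracts can be located with a simple pattern match. The sketch below is illustrative: the helper name and the flanking bases of the toy sequence are hypothetical, while the 15-base tract AAAAGAAAAGGGGGG is the LAV tract quoted in this section.

```python
import re

PURINE_RUN = re.compile(r"[AG]+")  # one or more consecutive purines


def longest_purine_run(seq: str) -> str:
    """Return the longest uninterrupted run of purines (A or G)."""
    runs = PURINE_RUN.findall(seq.upper())
    return max(runs, key=len) if runs else ""


LAV_TRACT = "AAAAGAAAAGGGGGG"  # LAV polypurine tract as quoted in the text

# Hypothetical pyrimidine flanks, added only so the run has clear boundaries.
toy = "TC" + LAV_TRACT + "CT"
print(len(LAV_TRACT))
print(longest_purine_run(toy))
```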

The limits of U5, R, and U3 elements were determined as follows. U5
is located between PBS and the polyadenylation site established from
the sequence of the 3' end of oligo(dT)-primed LAV cDNA (Alizon et
al., 1984). Thus U5 is 84 bp long. The length of R+U5 was determined
by synthesizing tRNA-primed LAV cDNA. After alkaline hydrolysis of
the primer, R+U5 was found to be 181 ± 1 bp (Figure 4). Thus R is 97
bp long and the cap site at its 5' end can be located. Finally, U3
is 456 bp long. The LAV LTR also contains characteristic regulatory
elements: a polyadenylation signal sequence AATAAA 19 bp from the
R-U5 junction, and the sequence ATATAAG, which is very likely the
TATA box, 22 bp 5' of the cap site. There are no long direct repeats
within the LTR. Interestingly, the LAV LTR shows some similarities
to that of the mouse mammary tumor virus (MMTV) (Donehower et al.,
1981). They both use tRNA^Lys as a primer for (-) strand synthesis,
whereas all other exogenous mammalian retroviruses known to date use
tRNA^Pro (Chen and Barker, 1984). They possess very similar
polypurine tracts; that of LAV is AAAAGAAAAGGGGGG while that of MMTV
is AAAAAAGAAAAAAGGGGG. It is probable that the viral (+) strand
synthesis is discontinuous since the polypurine tract flanking the
U3 element of the 3' LTR is found exactly duplicated in the 3' end
of orf pol, at 4331-4346. In addition, MMTV and LAV are exceptional
in that the U3 element can encode an orf. In the case of MMTV, U3
contains the whole orf while, in LAV, U3 contains 110 codons of the
3' half of orf F.

Figure 2. Comparison of the Genome Organization of LAV with Those
of Human T Cell Leukemia/Lymphoma Virus Type I (HTLV-I) (Seiki et
al., 1983), Moloney Murine Leukemia Virus (MoMuLV) (Shinnick et al.,
1981), and Rous Sarcoma Virus (RSV) (Schwartz et al., 1983)
The positions and sizes of viral genes are drawn to scale (open boxes)
and the viral genomes (RNA forms) are delimited by brackets.

Table 1. Locations and Sizes of Viral Open Reading Frames

orf      1st Triplet   Met     Stop    No. Amino Acids   Mr Calc.
gag      312           336     1,836   500               55,841
pol      1,631         1,934   4,640   (1,003)           (113,629)
orf Q    4,554         4,587   5,163   192               22,487
env      5,746         5,767   8,350   861               97,376
orf F    8,324         8,354   8,972   206               23,316

The nucleotide coordinates refer to the first base of the first triplet (1st triplet), of the first methionine (initiation) codon (Met), and of the stop codon
(Stop). The numbers of amino acids and molecular weights are those calculated for unmodified precursor products starting at the first methionine
through to the end, with the exception of pol, where the size and Mr refer to that of the whole orf.
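
The coordinates and amino acid counts in Table 1 are linked by simple arithmetic: the number of codons is the distance from the first base of the start codon to the first base of the stop codon, divided by three. The sketch below checks this, assuming the table values read as transcribed in the comments; the dictionary layout and function name are hypothetical. Following the table note, pol is counted from its first triplet and the other frames from their first methionine.

```python
# (first base of start used for counting, first base of stop codon, reported amino acids)
# Values as read from Table 1; pol is counted from the first triplet of the orf.
ORFS = {
    "gag":   (336, 1836, 500),
    "pol":   (1631, 4640, 1003),
    "orf Q": (4587, 5163, 192),
    "env":   (5767, 8350, 861),
    "orf F": (8354, 8972, 206),
}


def codons_between(start: int, stop: int) -> int:
    """Number of sense codons from start up to, but excluding, the stop codon."""
    span = stop - start
    if span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


for name, (start, stop, reported) in ORFS.items():
    computed = codons_between(start, stop)
    status = "ok" if computed == reported else "mismatch"
    print(f"{name}: computed {computed}, reported {reported} ({status})")
```

A coordinate pair that is not a multiple of three apart raises an error, which is a quick way to catch a transcription slip in the table.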

Viral Proteins 
gag 

Near the 5' extremity of the gag orf is a "typical" initiation codon
(Kozak, 1984) (position 336), which is not only the first in the gag
orf, but the first from the cap site. The precursor protein is 500
amino acids long. The calculated Mr of 55,841 agrees with the 55 kd
gag precursor polypeptide (Luc Montagnier, unpublished results). The
N-terminal amino acid sequence of the major core protein p25,
obtained by microsequencing (Genetic Systems, personal
communication), matches perfectly with the translated nucleotide
sequence starting from position 732 (see Figure 1). This formally
makes the link between the cloned LAV genome and the immunologically
characterized LAV p25 protein. The protein encoded 5' of the p25
coding sequence is rather hydrophilic. Its calculated Mr of 14,866
is consistent with that of the gag protein p18. The 3' part of the
gag region probably codes for the retroviral nucleic acid binding
protein (NBP). Indeed, as in HTLV-I (Seiki et al., 1983) and RSV
(Schwartz et al., 1983), the motif Cys-X2-Cys-X8-9-Cys common to all
NBP (Oroszlan et al., 1984) is found duplicated (nucleotides 1509
and 1572 in the LAV sequence). Consistent with its function, the
putative NBP is extremely basic (17% Arg + Lys).

Figure 3. Schematic Representation of the LAV Long Terminal Repeat
(LTR)

The LTR was reconstructed from the sequence of λJ19 by juxtaposing
the sequences adjacent to the Hind III cloning sites. Sequencing of
oligo(dT)-primed LAV cDNA clone pLAV75 (Alizon et al., 1984) rules out
the possibility of clustered Hind III sites in the R region of LAV. LTR are
limited by an inverted repeat sequence (IR). Both of the viral elements
flanking the LTR have been represented as tRNA primer binding site
(PBS) for 5' LTR and polypurine tract (PU) for 3' LTR. Also indicated
are a putative TATA box, the cap site, polyadenylation signal (AATAAA),
and polyadenylation site (CAA). The location of the open reading frame
F (648 nucleotides) is shown above the LTR scheme.

pol


The reverse transcriptase gene can encode a protein of up to 1003
amino acids (calculated Mr = 113,629). Since the first methionine
codon is 92 triplets from the origin of the open reading frame, it
is possible that the protein is translated from a spliced messenger
RNA, giving a gag-pol polyprotein precursor.

The pol coding region is the only one in which significant homology
has been found with other retroviral protein sequences, three
domains of homology being apparent. The first is a very short region
of 17 amino acids (starting at 1856). Homologous regions are located
within the p15 gag RSV protease (Dittmar and Moelling, 1978) and a
polypeptide encoded by an open reading frame located between gag and
pol of HTLV-I (Figure 5) (Schwartz et al., 1983; Seiki et al.,
1983). This first domain could thus correspond to a conserved
sequence in viral proteases. Its different locations within the
three genomes may not be significant since retroviruses, by splicing
or other mechanisms, express a gag-pol polyprotein precursor
(Schwartz et al., 1983; Seiki et al., 1983). The second and most
extensive region of homology (starting at 2048) probably represents
the core sequence of the reverse transcriptase. Over a region of 250
amino acids, with only minimal insertions or deletions, LAV shows
38% amino acid identity with RSV, 25% with HTLV-I, and 21% with
MoMuLV (Shinnick et al., 1981) while HTLV-I and RSV show 38%
identity in the same region. A third homologous region is situated
at the 3' end of the pol reading frame and corresponds to part of
the pp32 peptide of RSV that has exonuclease activity (Misra et al.,
1982). Once again, there is greater homology with the corresponding
RSV sequence than with HTLV-I.
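
Percent identity figures such as those quoted for the reverse transcriptase core are computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over aligned (non-gap) columns. The paper does not state which denominator was used, so this convention is an assumption, and the function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical), with one gap column.
print(round(percent_identity("MKV-LTA", "MRVQLSA"), 1))
```

Counting gap columns in the denominator instead would lower the figure slightly, which matters little here because the text reports only minimal insertions or deletions.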
Figure 4. Synthesis of RNA-Primed LAV cDNA for R+U5 (Strong-Stop
cDNA)

Lanes 1 and 2 show two different quantities of cDNA while lanes M and
M' represent markers. The strong-stop cDNA is 181 bases long with a
second, less intense band at 18a. The error of estimation is ±1 bp. This
maps the major cap site to the second G residue of the sequence
CTGGGTGT within the LTR, 24 nucleotides downstream of the TATA
box. This guanosine residue is taken as the first base in the nucleotide
sequence shown in Figure 1.

Figure 5. Location of a Short Stretch of Homology in the gag-pol
Region of the LAV, HTLV-I (Seiki et al., 1983), and RSV (Schwartz et al.,
1983) Genomes

Conserved amino acids are boxed. Homologous region is shown by
the solid bar in the schema. Each virus is organized differently in this
region but the sequence in the RSV genome maps to p15gag, which
has a protease-associated function.

env

The env open reading frame has a possible initiator methionine codon
very near the beginning (eighth triplet). If so, the molecular
weight of the presumed env precursor protein (861 amino acids, Mr
calc = 97,376) is consistent with the known size of the LAV
glycoprotein (110 kd and 90 kd after glycosidase treatment; Luc
Montagnier, unpublished). There are 32 potential N-glycosylation
sites (Asn-X-Ser/Thr), which are overlined in Figure 1. An
interesting feature of env is the very high number of Trp residues
at both ends of the protein. There are three hydrophobic regions,
characteristic of the retroviral envelope proteins (Seiki et al.,
1983), corresponding to a signal peptide (encoded by nucleotides
5815-5850), a second region (7315-7350), and a transmembrane segment
(7831-7896). The second hydrophobic region (7315-7350) is preceded
by a stretch rich in Arg + Lys. It is possible that this represents
a site of proteolytic cleavage, which, by analogy with other
retroviral proteins, would give an external envelope polypeptide and
a membrane-associated protein (Seiki et al., 1983; Kiyokawa et al.,
1984). A striking feature of the LAV envelope protein sequence is
that the region following the transmembrane segment is of unusual
length (150 residues). The env protein shows no homology to any
sequence in protein data banks. The small amino acid motif common to
the transmembrane proteins of all leukemogenic retroviruses
(Cianciolo et al., 1984) is not present in LAV env.
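
Potential N-glycosylation sites of the form Asn-X-Ser/Thr can be counted by scanning a protein sequence for the three-residue pattern. The sketch below follows the pattern exactly as stated in the text; excluding proline at the X position is a common refinement that is not applied here. The function name and the toy peptide are hypothetical.

```python
import re

# Asn, any residue, then Ser or Thr. The lookahead lets overlapping sites be found.
SEQUON = re.compile(r"(?=N[A-Z][ST])")


def sequon_positions(protein: str) -> list:
    """Return 1-based positions of the Asn in each Asn-X-Ser/Thr match."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


toy = "MNGSANNTSK"  # hypothetical peptide, one-letter amino acid code
print(sequon_positions(toy))
```

Applied to the translated env reading frame, the length of the returned list would give the count of potential sites.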
Q and F

The location of orf Q is without precedent in the structure of
retroviruses. Orf F is unique in that it is half-encoded by the U3
element of the LTR. Both orf have strong initiator codons (Kozak,
1984) near their 5' ends and can encode proteins of 192 amino acids
(Mr calc = 22,487) and 206 amino acids (Mr calc = 23,316),
respectively. Both putative proteins are hydrophilic (pQ 49% polar,
15.1% Arg + Lys; pF 46% polar, 11% Arg + Lys) and are therefore
unlikely to be associated directly with membrane. The function for
the putative proteins pQ and pF cannot be predicted, as no homology
was found by screening protein sequence data banks. Between orf F
and the pX protein of HTLV-I there is no detectable homology.
Furthermore, their hydrophobicity/hydrophilicity profiles are
completely different. It is known that retroviruses can transduce
cellular genes, notably proto-oncogenes (Weinberg, 1982). We suggest
that orfs Q and F represent exogenous genetic material and not some
vestige of cellular DNA because LAV DNA does not hybridize to the
human genome under stringent conditions (Alizon et al., 1984), and
their codon usage is comparable to that of the gag, pol, and env
genes (data not shown).
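
The Arg + Lys percentages quoted for pQ and pF are residue counts divided by protein length. A minimal sketch of that calculation follows; the function name and the toy peptide are hypothetical, and the same function could be used for any residue class once its membership is defined.

```python
def percent_residues(protein: str, residues: str = "RK") -> float:
    """Percentage of positions occupied by any of the given residues
    (one-letter code; the default counts Arg and Lys)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    hits = sum(1 for aa in protein if aa in residues)
    return 100.0 * hits / len(protein)


toy = "MKRAAKLLRG"  # hypothetical peptide with 4 basic residues out of 10
print(percent_residues(toy))
```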

Relationship to Other Retroviruses 

Although LAV is both morphologically and biochemically
(Barre-Sinoussi et al., 1983) distinct from HTLV-I and -II, it
remained possible that its genome was organized in a similar manner.
The characteristic features of HTLV-I and -II genomes, which they
share with the more distantly related bovine leukemia virus (BLV)
(Rice et al., 1984), are not observed in the case of LAV. These are:
first, a region 3' of the envelope gene consisting of a noncoding
stretch (600-900 bp), followed by a coding sequence of 307-357
codons (X open reading frame), which may slightly overlap the U3
region of the LTR (Seiki et al., 1983; Rice et al., 1984; Sagata et
al., 1984) and, second, the LTR being composed of unusually long U5
and R elements and the polyadenylation signal being situated in U3
instead of R (Seiki et al., 1983; Sagata et al., 1984; Shimotohno et
al., 1984). We show here that, in contrast, the 3' end of the LAV
envelope gene overlaps an open reading frame, termed F, that has the
coding capacity for 206 amino acids and extends within the LTR (110
amino acids are encoded by the U3 region). The putatively encoded
polypeptide (pF), the primary structure of which can be deduced,
does not show any homology with the theoretical X gene products of
the HTLV/BLV family. Also, the U5 and R elements are shorter (Table
2) and the polyadenylation signal is located within R, as is the
case for all retroviruses except the HTLV/BLV. Additionally, LAV
uses tRNA^Lys as (-) strand primer, as opposed to tRNA^Pro employed
by all other mammalian retroviruses except MMTV (Donehower et al.,
1981). Those homologies detected between the polymerase and protease
domains of LAV and HTLV are also found in several retroviruses, RSV
in particular.

Table 2. Comparison of the Size of the LAV LTR and LTR-Related
Elements to Those of Other Retroviruses

          LTR     U3      R     U5    PU      PBS   IR
LAV       638     456     97    85    15      LYS   4
HTLV-I    759     355           176   12(i)   PRO   4(i)
HTLV-II   763     314     248   261   12(i)   PRO   4(i)
MMTV      1,332   1,197   11    124   19      LYS   6(i)
MoMuLV    594     449     68    77    13      PRO   13
RSV       335     234     21    80    11      TRP   15
SNV       601     420     97    80    13      PRO   9

Adapted from Chen and Barker (1984).
i = imperfect match or tract.
SNV = spleen necrosis virus (Shimotohno and Temin, 19

It has been reported that a cloned HTLV-III genome hybridizes (Tm =
28°C) to sequences in the gag-pol and X regions of HTLV-I and -II;
although restriction maps of cloned LAV and HTLV-III show almost
perfect agreement (Hahn et al., 1984), we were unable to detect any
such hybridization between LAV and HTLV-II (Tm = 55°C) (Alizon et
al., 1984). Indeed, there is a short region of homology between LAV
and HTLV-I (23/27 nucleotides starting at position 1859 in the LAV
sequence) but nothing significant between the two viruses in the X
region of HTLV-I. One possible reason for this discrepancy is that
HTLV-III is subtly different from LAV. However, it was subsequently
reported that there was very minimal, if any, homology between orf X
(of HTLV-I) and HTLV-III (Shaw et al., 1984).

Discussion 

Regulatory sequences carried by retroviral LTR are believed to be
involved in specific interactions between the viral genome and the
host cell (Srinivasan et al., 1984). The LTR sequences of LAV are
unique among retroviruses. That could reflect an original mode of
gene expression, possibly in relation to particular transcriptional
factors present in the virus-harboring cell. This hypothesis can be
tested by studying the regulatory activity of the LAV LTR sequences
in transient or long-term experiments involving an indicator gene
and different cellular contexts.

The presence of the Q and F reading frames in addition to the
conventional gag-pol-env set of genes is unexpected. One should now
address the question of their role in the viral cycle and
pathogenicity by trying to characterize their protein product(s). It
is tempting to speculate on a role of such polypeptide(s) in the
death of T4 cells, a problem that can be studied by designing
synthetic peptides for antibody production or by using site-directed
mutagenesis of Q and F coding regions.

The peculiar genetic structure of LAV poses the question of its
origin. The virus shares common traits with other (apparently
unrelated) retroviruses. For instance, the unusually large size of
the outer membrane glycoprotein (env) and a comparably sized genome
are also observed in the case of lentiviruses such as Visna (Harris
et al., 1981; Querat et al., 1984). The presence of a large part of
the F open reading frame in the LTR, and the use of tRNA^Lys as a
primer for (-) strand synthesis, is reminiscent of the mouse mammary
tumor virus. On the other hand, homologies in the pol gene would
suggest that LAV is closer to RSV than to any other retrovirus.
Obviously, no clear picture can be drawn from the DNA sequence
analysis as far as phylogeny is concerned. Thus, it may well be that
LAV defines a new group of retroviruses that have been independently
evolving for a considerable period of time, and not simply a variant
recently derived from a characterized viral family. Both
epidemiology and pathogenesis of AIDS should be reconsidered with
this idea in mind, when trying to answer such questions as these:
Are there other human or animal diseases that are associated with
similarly organized viruses? Is there a precursor to AIDS-associated
virus(es) normally present, in latent form, in human populations?
What triggered in this case the recent spreading of pathogenic
derivatives?

Experimental Procedures 



Total λJ19 DNA was sonicated, treated with the Klenow fragment of
DNA polymerase plus deoxyribonucleotides (2 hr, 16°C), and
fractionated by agarose gel electrophoresis. Fragments of 300-600 bp
were excised, electroeluted, and purified by Elutip (Schleicher and
Schuell) chromatography. DNA was ethanol-precipitated using 10 µg
dextran T40 (Pharmacia) as carrier and ligated to dephosphorylated,
Sma I-cleaved M13mp8 RF DNA using T4 DNA and RNA ligases (16 hr,
16°C) and transfected into E. coli strain TG-1. Recombinant clones
were detected by plaque hybridization using the appropriate
32P-labeled LAV restriction fragments as probes. Single-stranded
templates were prepared from plaques exhibiting positive
hybridization signals and were sequenced by the dideoxy chain
termination procedure (Sanger et al., 1977) using α-35S-dATP
(Amersham, 400 Ci/mmol) and buffer gradient gels (Biggin et al.,
1983). Sequences were compiled and analyzed using the programs of
Staden adapted by B. Caudron for the Institut Pasteur Computer
Center (Staden, 1982).

Strong-Stop cDNA 

LAV virions from infected T lymphocyte (Barre-Sinoussi et al., 1983)
culture supernatant were pelleted through a 20% sucrose cushion and
the cDNA (-) strand was synthesized as described previously (Alizon
et al., 1984) except that no exogenous primer was used. After
alkaline hydrolysis (0.3 M NaOH, 30 min, 65°C), neutralization, and
phenol extraction, the cDNA was ethanol-precipitated and loaded onto
a 6% acrylamide/8 M urea sequencing gel with sequence ladders as
size markers.

Cell
16